Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix initrand to avoid random number sequences overlapping #18744

Merged
merged 11 commits into from
Sep 2, 2021

Conversation

demotomohiro
Copy link
Contributor

@demotomohiro demotomohiro commented Aug 24, 2021

This is the alternative of #18149.
Fix #17898.
It allows 2^64 Rand generate 2^64 random numbers without random number sequences overlapping.

If states of PRNGs were initialized using time, multiple states of PRNG (pseudorandom number generator) can have same value when they are initialized at almost same time.
If states of PRNG were initialized with random value, sequences generated by them can be overlapped.
https://blogs.sas.com/content/iml/2018/05/07/overlap-random-streams.html
The probability of overlap of random subsequences of pseudorandom number generator is explained in this paper:
https://vigna.di.unimi.it/ftp/papers/overlap.pdf

According to the paper, the probability of overlap is always at most n^2 * L / P where n is a number of PRNG, L is a length of sequences and P is a period of the PRNG.

This PR implementates initRand in the same way as runnableExamples of skipRandomNumbers in random module.
It try to avoid a PRNG state identicals to a state of other PRNGs in other processes by initializing the baseState with std/sysrand.
When it doesn't worked, baseSeed is initialized with hash value of current process id and monotime.
But different processes in different machines can have same process id.

initRand avoids random number sequences overlapping by copying baseState to new Rand state and call skipRandomNumbers with baseState.

I didn't touched js code because how skipRandomNumbers works in js is unknown.
Even if it works and can skip 2^64 times next() calls, that means it is same to call next() once because maximum period of Rand in js is 2^64 - 1, because sizeof(Rand) == 8 in js.
I think it should be replaced with 32-bit Generators like xoshiro128++ or xoshiro128** if random module doesn't need to generate same sequences in all platform.
https://prng.di.unimi.it
Or implement 64bit int addition and bit operation with 32 bit int for correct xoroshiro128+.

I visualized how Pseudo-random number generators created with jump function(skipRandomNumbers) and with random seed use PRNG state space.

prng
Source:
https://gist.github.com/demotomohiro/9516fe686da376bbce8a4470e52752ab

Each blue box represents a PRNG state.
Red rounded box represents PRNG that generates 4 random numbers by updating state 4 times.
Upper block represents how PRNGs created with jump function uses PRNG states.
Lower block represents how PRNGs created with random seed uses PRNG states.
In lower block, several PRNGs uses same states.
That means a part of random number sequence from one PRNG matchs a part of random number sequence from other PRNG.

This document explains how this PR avoid random number sequences overlapping:
https://www.mathcha.io/editor/7JPgyt0WSLdH4L7VywudZkBrQF4qJOe8Fp3JBB

@timotheecour
Copy link
Member

timotheecour commented Aug 25, 2021

If states of PRNGs were initialized using time, multiple states of PRNG (pseudorandom number generator) can have same value when they are initialized at almost same time.

that's not true anymore, I fixed this in #17468 which already calls skipRandomNumbers(result) in the right place; it's unclear whether your PR adds redundant calls to skipRandomNumbers given that skipRandomNumbers is already being called by initRand(seed)

@demotomohiro
Copy link
Contributor Author

If states of PRNGs were initialized using time, multiple states of PRNG (pseudorandom number generator) can have same value when they are initialized at almost same time.

that's not true anymore, I fixed this in #17468 which already calls skipRandomNumbers(result) in the right place; it's unclear whether your PR adds redundant calls to skipRandomNumbers given that skipRandomNumbers is already being called by initRand(seed)

initRand*(seed: int64) is a function because it always returns same value from same input.
If initRand*(seed: int64): Rand was called with times.getTime() twice and first and second times.getTime() call returned the same value as first times.getTime(), two Rand returned from initRand*() would have same status.

In this PR, one process have only one baseState that is initialized on start up with random number from sysrand.urandom or monotime + process Id.
After baseState was initialized, it is updated only with skipRandomNumbers in initRand().
When initRand() is called, baseState is copied to new Rand and baseState is updated with skipRandomNumbers.
initRand() doesn't take random value from time or sysrand and doesn't call initRand*(seed: int64) except js backend.
So When initRand() is called multiple times, states of Rand returned from initRand() doesn't overlap to each other unless it calls next*(r: var Rand): uint64 more than 2^64 times or initRand is called more than 2^64 times.

This document explains how this PR avoid random number sequences overlapping:
https://www.mathcha.io/editor/7JPgyt0WSLdH4L7VywudZkBrQF4qJOe8Fp3JBB

lib/pure/random.nim Outdated Show resolved Hide resolved
lib/pure/random.nim Outdated Show resolved Hide resolved
ret = DefaultRandSeed
ret

when compileOption("threads"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to have the default random state be a per-thread global? That way, locking wouldn't be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baseState cannot be a per-thread global because if it was per-thread, it cannot avoid random number sequences overlapping between different PRNGs in different thread.

Quote from https://prng.di.unimi.it (web site created by creator of xoroshiro):

We provide ready-made jump functions for a number of calls equal to the square root of the period, to make it easy generating non-overlapping sequences for parallel computations, and equal to the cube of the fourth root of the period, to make it possible to generate independent sequences on different parallel processors.

skipRandomNumbers*(s: var Rand) in random module is a jump function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow - if per-thread state is initialized with random data via urandom, why would skipRandomNumbers overlap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please read this:
https://www.mathcha.io/editor/7JPgyt0WSLdH4L7VywudZkBrQF4qJOe8Fp3JBB

Initial state of all Rand must be T2^64n (n is natural number including 0) to avoid overlapping.
If baseState were per-thread variable and they were initialized with urandom, initial state of all Rand cannot become T2^64n.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@demotomohiro ...so now every call to next needs to hold the lock! I imagine this would be very slow. Is there a better solution than ignoring the data races?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timotheecour The approach in #18149 would suffer from overlap issues though, wouldn't it? Which is what I believe @demotomohiro is aiming to avoid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach in #18149 would suffer from overlap issues though, wouldn't it?

I was also ensuring skipRandomNumbers gets called but indeed, the way it was being called wouldn't guarantee lack of overlap ; i'll need to think more to double check;

that said, the potential performance issue should be evaluated/investigated

Copy link
Contributor

@Varriount Varriount Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, upon re-reviewing the code:

  • The lock is only for initialization of new random states (frombasestate), including the global state. I had previously misread this, and thought all access to the global random state (state) was being locked.
  • The current global random state isn't threadsafe at all (unless the implementation is a lock-free algorithm of some sort), and isn't a per-thread global. An obvious way to solve this to make state a thread-local global, and check/initialize it when it is accessed. This would impose a performance cost, though I don't know how small it would be - it would probably depend on whether the backend compiler is smart enough to hoist the random state somewhere on the stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@planetis-m
As @Varriount said, only initRand() proc acquires lock.
If you create Rand in each threads and never share them between threads, you don't need to lock when you call next(r: var Rand).
If you create only one Rand and want to share it by multiple threads, you would need to acquire lock when you call next(r: var Rand),
but it is not a problem this PR cause.

@Varriount
In this document, T0 is a initial state of baseState and it is the value came from sysrand.urandom.
https://www.mathcha.io/editor/7JPgyt0WSLdH4L7VywudZkBrQF4qJOe8Fp3JBB
Then, first Rand returned from initRand() has state T0.
second one has T2^64, third one has T2^642, 4th one has T2^643, n-th one has T2^64*(n-1) and so on.
Then, first Rand returned from initRand() can use states from T0 to T2^64-1 without overlapping the state with second one.
Second one can use from T2^64 to T2^642-1, third one can use from T2^642 to T2^64*3-1, and so on.

@timotheecour
I don't think performance of initRand() is so important, because it would not be called as frequently as next(r: var Rand) as initRand() is called only when you create a new Rand.
When many Rand are created but they generate few random numbers, performance of initRand() can be important.
But in that case, I think hash functions tested in paper would be better:
http://jcgt.org/published/0009/03/02/

An obvious way to solve this to make state a thread-local global, and check/initialize it when it is accessed.

I think making state a thread-local variable and Nim compiler automatically inserts a call to randomize() so that it is called when a new thread starts would be better.
But I don't know it is possible.

Copy link
Contributor

@planetis-m planetis-m Sep 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it now, good one +1. The key thing to remember is state and baseState are different variables. state is used only for the "simple" not thread safe api, while baseState is used in the new initRand overload.

lib/pure/random.nim Outdated Show resolved Hide resolved
Comment on lines 646 to 662
for i in 0 .. 7:
if sysrand.urandom(urand):
copyMem(ret.addr, urand[0].addr, sizeof(Rand))
if ret.isValid:
break

if not ret.isValid:
# When 2 processes executed at same time on different machines,
# `ret` can still have same value.
let
pid = getCurrentProcessId()
t = getMonoTime().ticks
ret.a0 = pid.hash().Ui
ret.a1 = t.hash().Ui
if not ret.isValid:
ret.a0 = pid.Ui
ret.a1 = t.Ui
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would the data supplied by sysrand.urandom not correctly initialize the global random state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xoroshiro128+ doesn't work when all bits of state are zero.
If sysrand.urandom is not guaranteed to never return zeros when sizeof(Rand) bytes are taken from it, we need to check returned value.
Maybe there is a bug in OS and sysrand.urandom returns zero.
Even if sysrand.urandom is working correctly, you might get a zero unfortunately (probability of 128bits random number returns 0 is 1/2^128 ).

According to the comment of sysrand.urandom proc,

If the call succeeds, returns true.

In other words, it can fail. So we need alternative ways to get random number.
Even if probability of sysrand.urandom returns 128bit zeros is tiny, it is easy to take care of such case if we already have alternative ways to get random number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ANDing the state with 1 after seeding via urandom? That would ensure that the state is never zero.
Alternately, an explicit check and retry would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isValid(r: Rand): bool func checks whether state is zero or not.

func isValid(r: Rand): bool {.inline.} =
  not (r.a0 == 0 and r.a1 == 0)

This PR retrys urandom if isValid was false.

lib/pure/random.nim Outdated Show resolved Hide resolved
lib/pure/random.nim Outdated Show resolved Hide resolved
@@ -635,20 +653,60 @@ when not defined(nimscript) and not defined(standalone):
let time = int64(times.epochTime() * 1000) and 0x7fff_ffff
result = initRand(time)
else:
let now = times.getTime()
result = initRand(convert(Seconds, Nanoseconds, now.toUnix) + now.nanosecond)
proc getRandomState(): Rand =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this nested proc to top-level scope, simpler/cleaner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why moving nested proc to top-level scope makes simpler/cleaner?
It doesn't reduce a total number of lines of random module.
getRandomState proc is used only in initRand().

for i in 0..<size:
for j in 0..<numRepeat:
discard rands[i].next
doAssert rands[i] notin randSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed in a follow up PR but the notion that a random sequence of numbers doesn't "overlap" with some other sequence seems to weaken the "randomness" and not to improve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think think this is correct; the sequences of numbers generated can have overlap in parts, eg:
1 (9 8 3) 2 5
2 (9 8 3) 7 1

but the underlying states won't overlap (so long we remain within a period); if they did, the sequences generated would be identical at any point after which the state overlaps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell me why is it weaken randomness?
It just works like evenly cut a random number sequence generated by calling next(r: var Rand) with single Rand, and assign each sub sequence to each Rand.
If you think randomness of random module is not good enough, I think we need to use different algorithm like xoroshiro128++, xoroshiro128** or PCG.

@Araq Araq merged commit 7c8ea49 into nim-lang:devel Sep 2, 2021
PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022
…8744)

* Fix initrand to avoid random number sequences overlapping

* Minor fix

* Fix compile error on js backend

* Disable new test for js backend

* Minor fix

* tempfiles module uses random.initRand()

* Remove unused module import from lib/std/tempfiles.nim

* Initialize baseState in initRand()

* Run tests/stdlib/trandom.nim from tests/test_nimscript.nims

* baseState is initialized only with sysrand.urandom and quit if failed

* Add comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

randomPathName called twice in a row can return the same string on windows
6 participants